Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors
Abstract
With the broad deployment of multicore processors, there is growing demand to port OpenCL programs written for GPUs to multicore processors. However, GPU-oriented OpenCL programs cannot run efficiently on multicore processors, because they generally consist of a huge number of threads. This paper presents an experimental comparison of three thread execution methods for GPU-oriented OpenCL programs on multicore processors, using a set of industry-oriented OpenCL benchmark programs.
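The abstract does not name the three methods it compares. The sketch below is only an illustration of why thread count matters. It contrasts two generic ways a CPU runtime could execute a kernel's work-items:

- **Naive mapping:** one OS thread per work-item.
- **Coalesced mapping:** a small pool of worker threads, each looping over a contiguous range of work-items.

These are assumptions chosen for the sketch, not the paper's methods. All function names (`vector_add_kernel`, `run_one_thread_per_work_item`, `run_serialized_chunks`) are hypothetical. Python threads are used for readability only; because of the GIL, this code says nothing about real performance.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(gid, a, b, out):
    """Hypothetical kernel body: the work done by one work-item with global id `gid`."""
    out[gid] = a[gid] + b[gid]


def run_one_thread_per_work_item(kernel, global_size, *args):
    """Naive GPU-style mapping (assumed for illustration): one OS thread per work-item."""
    threads = [threading.Thread(target=kernel, args=(gid, *args))
               for gid in range(global_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_serialized_chunks(kernel, global_size, *args, num_workers=None):
    """Coalesced mapping (assumed for illustration): few threads, each looping over work-items."""
    if global_size == 0:
        return  # nothing to execute; also avoids a zero step in range() below
    num_workers = num_workers or os.cpu_count() or 1
    chunk = (global_size + num_workers - 1) // num_workers  # ceiling division

    def worker(start):
        # Run work-items start .. start+chunk-1 one after another in this single thread.
        for gid in range(start, min(start + chunk, global_size)):
            kernel(gid, *args)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for every chunk and re-raises any exception a worker hit.
        list(pool.map(worker, range(0, global_size, chunk)))
```

Both functions compute the same result, but the first creates `global_size` threads, while the second creates at most `num_workers`. A real CPU OpenCL runtime has an extra complication that this sketch ignores. A kernel that calls `barrier()` cannot be run as one plain loop over its work-group's work-items. The loop has to be split at each barrier.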
Similar resources
OpenCL for programming shared memory multicore CPUs
Shared memory multicore processor technology is pervasive in mainstream computing. This new architecture challenges programmers to write code that scales over these many cores to exploit the full computational power of these machines. OpenMP and Intel Threading Building Blocks (TBB) are two of the popular frameworks used to program these architectures. Recently, OpenCL has been defined as a sta...
OpenCL on shared memory multicore CPUs
Shared memory multicore processor technology is pervasive in mainstream computing. This new architecture challenges programmers to write code that scales over these many cores to exploit the full computational power of these machines. OpenMP and Intel Threading Building Blocks (TBB) are two of the popular frameworks used to program these architectures. Recently, OpenCL has been defined as a sta...
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on a single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementatio...
Performance Portability in Accelerated Parallel Kernels
Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for...
OpenCL Evaluation for Numerical Linear Algebra Library Development
With the help of CUDA [7], [6], many applications improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Other than CUDA, there exist other frameworks that allow platform-independent programming for GPUs. The main three frameworks are: 1) ...
Publication date: 2015